IngestThis

Tag: data engineering

2026-02-19 • Alex Merced

How to Think Like a Data Engineer

The median lifespan of a popular data tool is about three years. The tool you master today may be deprecated or replaced...

2026-02-19 • Alex Merced

How to Design Reliable Data Pipelines

Most pipeline failures aren't caused by bad code. They're caused by no architecture. A script that reads from an API, tr...

2026-02-19 • Alex Merced

Data Quality Is a Pipeline Problem, Not a Dashboard Problem

When an analyst finds null values in a revenue column, the typical response is to add a calculated field in the BI tool:...

2026-02-19 • Alex Merced

Idempotent Pipelines: Build Once, Run Safely Forever

A pipeline runs, processes 100,000 records, and loads them into the target table. Then it fails on a downstream step. Th...

2026-02-19 • Alex Merced

Schema Evolution Without Breaking Consumers

A source team renames a column from `user_id` to `customer_id`. Twelve hours later, five dashboards show blank values, t...

2026-02-19 • Alex Merced

Batch vs. Streaming: Choose the Right Processing Model

"We need real-time data." This is one of the most expensive sentences in data engineering — because it's rarely true, an...

2026-02-19 • Alex Merced

Partition and Organize Data for Performance

A table with 500 million rows takes 45 seconds to query. After partitioning it by date, the same query — filtering on a ...

2026-02-19 • Alex Merced

Testing Data Pipelines: What to Validate and When

Ask an application developer how they test their code and they'll describe unit tests, integration tests, CI/CD pipeline...

2026-02-19 • Alex Merced

Pipeline Observability: Know When Things Break

An analyst messages you on Slack: "The revenue numbers look wrong. Is the pipeline broken?" You check the orchestrator —...

2026-02-19 • Alex Merced

Data Engineering Best Practices: The Complete Checklist

Best practices documents are easy to write and hard to use. They list principles without context, advice without priorit...

2026-02-13 • Alex Merced

A 2026 Introduction to Apache Iceberg

An updated introduction to Apache Iceberg...

2025-12-29 • Alex Merced

2025 Year in Review: Apache Iceberg, Polaris, Parquet, and Arrow

A look back at key developments in Apache Iceberg, Polaris, Parquet, and Arrow in 2025....

2025-01-06 • Alex Merced

RAG Isn’t a Modeling Problem. It’s a Data Engineering Problem.

Why retrieval-augmented generation systems fail in enterprises—and what to do about it....

2025-01-02 • Alex Merced

Building Pangolin - My Holiday Break, an AI IDE, and a Lakehouse Catalog for the Curious

A personal story of how I built Pangolin Catalog over a holiday break using an AI-powered IDE....

2024-11-15 • Alex Merced

Deep Dive into Dremio's File-based Auto Ingestion into Apache Iceberg Tables

Auto-ingesting data from JSON, CSV, and Parquet files into Apache Iceberg tables...

2024-11-05 • Alex Merced

Dremio, Apache Iceberg and their role in AI-Ready Data

The Role of Dremio and Apache Iceberg in AI-Ready Data...

2024-10-31 • Alex Merced

Hands-on with Apache Iceberg & Dremio on Your Laptop within 10 Minutes

How to get hands-on with Apache Iceberg...

2024-10-30 • Alex Merced

Data Modeling - Entities and Events

How to Model Events and Entities...

2024-10-21 • Alex Merced

All About Parquet Part 01 - An Introduction

All about the Apache Parquet File Format...

2024-10-21 • Alex Merced

All About Parquet Part 02 - Parquet's Columnar Storage Model

All about the Apache Parquet File Format...

copyright 2022 by Alex Merced of alexmercedcoder.dev